This is a quick walkthrough demonstrating how to project new scRNA-seq data onto an existing SWNE embedding. We’ll use a human cortical neuron dataset generated with the snDropSeq technology as our training dataset. The pre-computed snDropSeq Seurat object can be found here. We’ll then project data generated with a much older technology, the C1 microfluidic platform. The C1 data contains the same cortical neuron cell types, but comes from a different patient and a different scRNA-seq technology. The pre-computed C1 Seurat object can be found here.
Load the required libraries
library(Seurat)
library(swne)
First, we’ll be constructing an SWNE embedding for the snDropSeq training data
Load the snDropSeq training data, normalize the counts matrix, and extract the overdispersed genes
train.obj <- readRDS("~/swne/Data/snDropSeq_cortical_neurons.Robj")
train.norm.counts <- ExtractNormCounts(train.obj, obj.type = "seurat", rescale = T)
## calculating variance fit ... using gam
var.genes <- train.obj@var.genes
Extract training data cell types
train.clusters <- train.obj@ident
names(train.clusters) <- train.obj@cell.names
Run NMF on the overdispersed genes, then project the full gene set onto the resulting factors
n.cores <- 12
k <- 30
train.nmf.res <- RunNMF(train.norm.counts[var.genes,], k = k, init = "ica", n.cores = n.cores, ica.fast = T)
train.nmf.res$W <- ProjectFeatures(train.norm.counts, train.nmf.res$H, n.cores = n.cores)
train.nmf.scores <- train.nmf.res$H
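To make the roles of `W` and `H` concrete, here is a minimal base R sketch of NMF on a toy matrix. It uses Lee and Seung's multiplicative updates with a random start, which is an assumption made for illustration and not necessarily what `RunNMF` does internally (the call above uses an ICA initialization). All names in the block (`A`, `k.toy`, `frob.err`, and so on) are hypothetical.

```r
# Toy NMF via multiplicative updates; base R only (illustrative, not swne internals)
set.seed(1)
n.genes <- 20; n.cells <- 15; k.toy <- 3

# Simulate a non-negative genes x cells matrix with exact rank-3 structure
W.true <- matrix(runif(n.genes * k.toy), n.genes, k.toy)
H.true <- matrix(runif(k.toy * n.cells), k.toy, n.cells)
A <- W.true %*% H.true

# Random non-negative starting point
W <- matrix(runif(n.genes * k.toy), n.genes, k.toy)
H <- matrix(runif(k.toy * n.cells), k.toy, n.cells)

# Frobenius norm of the reconstruction error
frob.err <- function(A, W, H) sqrt(sum((A - W %*% H)^2))
err.init <- frob.err(A, W, H)

eps <- 1e-9  # guards against division by zero
for (i in seq_len(200)) {
  H <- H * (t(W) %*% A) / (t(W) %*% W %*% H + eps)
  W <- W * (A %*% t(H)) / (W %*% H %*% t(H) + eps)
}
err.final <- frob.err(A, W, H)

dim(W)  # genes x factors (gene loadings)
dim(H)  # factors x cells (factor scores per cell)
```

In the walkthrough, `train.nmf.res$H` plays the role of `H` (factor scores for each cell) and `train.nmf.res$W` plays the role of `W` (gene loadings).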
Build the Shared Nearest Neighbors network (SNN)
train.obj <- BuildSNN(train.obj, dims.use = 1:20, k.param = 20, prune.SNN = 1/20)
Run SWNE embedding and hide the factors
train.embedding <- EmbedSWNE(train.nmf.scores, train.obj@snn, alpha.exp = 1.25, snn.exp = 1,
                             n_pull = 3, proj.method = "sammon", dist.use = "cosine")
## Initial stress : 0.37540
## stress after 5 iters: 0.20183
train.embedding$H.coords$name <- "" ## Hide factors
Embed key neuronal markers
genes.embed <- c("CBLN2", "NRGN", "GRIK1", "NTNG1", "DAB1", "DCC", "POSTN")
train.embedding <- EmbedFeatures(train.embedding, train.nmf.res$W, genes.embed, n_pull = 3, scale.cols = F)
Make SWNE plot of training data
plot.seed <- 124532
PlotSWNE(train.embedding, alpha.plot = 0.3, sample.groups = train.clusters, do.label = T,
         label.size = 3.5, pt.size = 0.75, show.legend = F, seed = plot.seed)
Now we can project the C1 test dataset. First, let’s load the Seurat object and normalize the counts matrix.
test.obj <- readRDS("~/swne/Data/C1_cortical_neurons.Robj")
test.norm.counts <- ExtractNormCounts(test.obj, obj.type = "seurat", rescale = T, rescale.method = "log")
## calculating variance fit ... using gam
Extract the test dataset cell types
test.clusters <- test.obj@ident
names(test.clusters) <- test.obj@cell.names
levels(test.clusters)
## [1] "Ex1" "Ex2" "Ex3" "Ex4" "Ex5" "Ex6" "Ex7" "Ex8" "In1" "In2" "In3"
## [12] "In4" "In5" "In6" "In7" "In8"
Match the test dataset cell types to the training dataset cell types
test.clusters <- plyr::revalue(test.clusters, replace =
                                 c("Ex1" = "Ex_L2/3", "Ex2" = "Ex_L4", "Ex3" = "Ex_L4",
                                   "Ex4" = "Ex_L4/5", "Ex5" = "Ex_L5", "Ex6" = "Ex_L6",
                                   "Ex7" = "Ex_L6", "Ex8" = "Ex_L6b",
                                   "In7" = "In7/8", "In8" = "In7/8"))
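The relabeling above maps several test labels to the same training label (for example, `Ex2` and `Ex3` both become `Ex_L4`), while labels that are not listed, such as `In1`, are left unchanged. The sketch below reproduces that behavior on a toy factor using only base R. The `toy.clusters` vector is made up for illustration.

```r
# Toy factor with a few of the C1 cluster labels (hypothetical data)
toy.clusters <- factor(c("Ex2", "Ex3", "In1", "In7", "In8", "In7"))

# Assigning the same name to several levels merges them into one level
merged <- toy.clusters
levels(merged)[levels(merged) %in% c("Ex2", "Ex3")] <- "Ex_L4"
levels(merged)[levels(merged) %in% c("In7", "In8")] <- "In7/8"

levels(merged)  # "In1" is untouched because it was never renamed
table(merged)   # cells per merged label
```

A quick `table()` on the relabeled factor is a cheap way to confirm that no test label was left behind under its old name.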
Project the test gene expression matrix onto the training dataset NMF factors, using only the overdispersed genes present in both datasets
genes.project <- intersect(var.genes, rownames(test.norm.counts))
test.nmf.scores <- ProjectSamples(test.norm.counts, train.nmf.res$W, features.use = genes.project, n.cores = n.cores)
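Conceptually, projecting new samples means keeping the trained gene loadings `W` fixed and solving only for non-negative factor scores `H` on the genes shared by both datasets. The base R sketch below illustrates that idea with multiplicative updates on toy data. It is an assumption about the general technique, not a description of how `ProjectSamples` is implemented, and every name in it (`W.train`, `A.test`, `shared`, and so on) is hypothetical.

```r
# Solve for non-negative scores H with the loadings W held fixed (illustrative only)
set.seed(2)
k.toy <- 3; n.test <- 8
genes.all <- paste0("g", 1:30)

# Loadings for 30 toy genes; training saw g1..g20, the test data covers g5..g30
W.all <- matrix(runif(30 * k.toy), 30, k.toy, dimnames = list(genes.all, NULL))
W.train <- W.all[1:20, ]
H.true <- matrix(runif(k.toy * n.test), k.toy, n.test)
A.test <- W.all[5:30, ] %*% H.true  # toy test matrix, genes x cells

# Restrict both matrices to the shared genes, as with genes.project above
shared <- intersect(rownames(W.train), rownames(A.test))
W.fixed <- W.train[shared, ]
A.shared <- A.test[shared, ]

# Multiplicative updates for H only; W.fixed never changes
H <- matrix(runif(k.toy * n.test), k.toy, n.test)
err.init <- sqrt(sum((A.shared - W.fixed %*% H)^2))
WtA <- t(W.fixed) %*% A.shared
WtW <- t(W.fixed) %*% W.fixed
for (i in seq_len(500)) {
  H <- H * WtA / (WtW %*% H + 1e-9)
}
err.final <- sqrt(sum((A.shared - W.fixed %*% H)^2))

dim(H)  # factors x test cells
```

Because the loadings stay fixed, the test cells are scored in the same factor space as the training cells, which is what makes the two sets of scores comparable.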
Project the test data onto the training SNN, then onto the training SWNE embedding
test.snn <- ProjectSNN(test.norm.counts, train.norm.counts, n.pcs = 30, features.use = genes.project, k = 20,
                       print.output = F)
## [1] "Running PCA and test data projection"
test.embedding <- ProjectSWNE(train.embedding, test.nmf.scores, SNN = test.snn,
                              alpha.exp = 1.25, snn.exp = 0.25, n_pull = 3)
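The console message above indicates that the SNN projection runs a PCA and then projects the test data. One plausible way to picture this step is to fit a PCA on the training cells, place the test cells in the same principal component space, and look up each test cell's nearest training cells. The base R sketch below shows only that generic idea on random toy data. It is not the swne implementation, it uses plain Euclidean distance, and all names are hypothetical.

```r
# Generic "fit PCA on training, project test, find nearest training cells" sketch
set.seed(3)
n.genes <- 10
# prcomp expects cells in rows, so these toy matrices are cells x genes
train <- matrix(rnorm(40 * n.genes), 40, n.genes)
test <- matrix(rnorm(6 * n.genes), 6, n.genes)
colnames(train) <- colnames(test) <- paste0("g", seq_len(n.genes))

# Fit PCA on the training cells only, then project the test cells onto it
pca <- prcomp(train, center = TRUE, scale. = FALSE)
n.pcs <- 5
train.pc <- pca$x[, 1:n.pcs]
test.pc <- predict(pca, newdata = test)[, 1:n.pcs]

# For each test cell, indices of its k.nn nearest training cells in PC space
k.nn <- 4
nn.idx <- t(apply(test.pc, 1, function(cell) {
  d <- sqrt(colSums((t(train.pc) - cell)^2))
  order(d)[1:k.nn]
}))

dim(nn.idx)  # test cells x k.nn
```

The key design point is that the PCA is fit on the training data alone, so the test cells never change the coordinate system they are being projected into.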
Make sure we’re using the same cluster colors
cluster.colors <- ExtractSWNEColors(train.embedding, train.clusters, seed = plot.seed)
cluster.colors <- cluster.colors[grepl("Ex|In", names(cluster.colors))]
cluster.colors[["In5"]] <- "#00BFFF"
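Before plotting, it can help to confirm that every test cluster label has an entry in the color vector, since a manual color scale has no color to assign to a label that is missing from it. The toy sketch below mirrors the three steps above (build a named color vector, keep the neuronal clusters, set `In5` by hand) and then runs that check. The color values, the `Oli` cluster name, and the `toy.*` objects are made up for illustration.

```r
# Named color vector, one entry per training cluster (hypothetical values)
toy.colors <- c(Ex_L4 = "#F8766D", In1 = "#00BA38", Oli = "#619CFF")

# Keep only excitatory/inhibitory neuron clusters, then set In5 by hand
toy.colors <- toy.colors[grepl("Ex|In", names(toy.colors))]
toy.colors[["In5"]] <- "#00BFFF"

# Test cluster labels that still lack a color (ideally none)
toy.groups <- factor(c("Ex_L4", "In1", "In5", "In1"))
missing.colors <- setdiff(levels(toy.groups), names(toy.colors))
missing.colors
```

In the walkthrough itself, the equivalent check would compare `levels(test.clusters)` against `names(cluster.colors)`.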
Plot the test data projected onto the training SWNE embedding. Note how the test data cell types map onto the same spatial locations as the corresponding training data cell types.
PlotSWNE(test.embedding, alpha.plot = 0.5, sample.groups = test.clusters, do.label = T,
         pt.size = 1.5, show.legend = F, seed = plot.seed) +
  scale_color_manual(values = cluster.colors)
## Scale for 'colour' is already present. Adding another scale for
## 'colour', which will replace the existing scale.